knitr document van Steensel lab
TF reporter cDNA-count processing - K562
Introduction
I previously processed the raw sequencing data, optimized the barcode clustering, quantified the pDNA data and normalized the cDNA data. In this script, I take a detailed look at the normalized cDNA data from a general perspective.
Analysis
First insights into data distribution - reporter activity distribution plots
Heat map - display mean log2-activity for each TF in each condition
Heatmap for native enhancers
Run FIMO script again
# motfn=/home/f.comoglio/mydata/Annotations/TFDB/Curated_Natoli/update_2017/20170320_pwms_selected.meme
# odir=/home/m.trauernicht/mydata/projects/tf_activity_reporter/data/SuRE_TF_1/results/native-enhancer/fimo
# query=/home/m.trauernicht/mydata/projects/tf_activity_reporter/data/SuRE_TF_1/results/native-enhancer/cDNA_df_native.fasta
# nice -n 19 fimo --no-qvalue --thresh 1e-4 --verbosity 1 --o $odir $motfn $query
load fimo results
We built a TF motif matrix using -log10 transformed FIMO scores. We used this feature encoding throughout the rest of this analysis, unless otherwise stated.
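As a minimal sketch of this feature encoding, the snippet below turns a FIMO hit table into a reporter-by-motif matrix. It assumes that "FIMO scores" refers to the FIMO p-values, that only the strongest hit per reporter and motif is kept, and that reporter/motif pairs without a hit are set to 0. The toy data frame and its column names (`motif_id`, `sequence_name`, `p_value`) are placeholders, not the actual FIMO output of this analysis.

```r
# Toy FIMO-like hit table (hypothetical values, not real results)
fimo_hits <- data.frame(
  motif_id      = c("TF_A", "TF_A", "TF_B", "TF_C"),
  sequence_name = c("rep1", "rep1", "rep1", "rep2"),
  p_value       = c(1e-5, 1e-6, 1e-4, 1e-8),
  stringsAsFactors = FALSE
)

# -log10 transform: smaller p-values give larger feature values
fimo_hits$neg_log10_p <- -log10(fimo_hits$p_value)

# Keep the strongest hit per reporter x motif pair (assumption);
# combinations without any hit come out as NA
motif_matrix <- tapply(
  fimo_hits$neg_log10_p,
  list(fimo_hits$sequence_name, fimo_hits$motif_id),
  max
)

# Encode "no motif hit" as 0 (assumption)
motif_matrix[is.na(motif_matrix)] <- 0

print(motif_matrix)
```

Rows are reporters and columns are motifs, so the matrix can be passed directly to a heatmap function or used as the predictor matrix of a model.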
visualize fimo results
Look at only expressed TFs in mESCs
Filter expressed TFs
Use FIMO matrix to build loglinear model
Binary presence of motif to explain expression variance
Heatmap per TF - comparing design activities mutated vs. non-mutated
Heatmap per TF - only WT TF activities
Compute activity changes relative to their negative controls
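One way to express such a change is a log2 ratio of each reporter over its matched negative control. The sketch below assumes linear-scale activities and one negative control per TF and condition; the data frame `reporters` and all of its columns are hypothetical and only illustrate the calculation.

```r
# Toy reporter activities on a linear scale (hypothetical values)
reporters <- data.frame(
  tf        = c("TF_A", "TF_A", "TF_B", "TF_B"),
  condition = c("ctrl", "ctrl", "ctrl", "ctrl"),
  neg_ctrl  = c(FALSE, TRUE, FALSE, TRUE),
  activity  = c(8, 2, 3, 3),
  stringsAsFactors = FALSE
)

# Mean negative-control activity per TF and condition
neg <- aggregate(activity ~ tf + condition,
                 data = reporters[reporters$neg_ctrl, ],
                 FUN = mean)
names(neg)[names(neg) == "activity"] <- "neg_activity"

# Attach the matching control value to every non-control reporter
rel <- merge(reporters[!reporters$neg_ctrl, ], neg, by = c("tf", "condition"))

# log2 fold change over the negative control
rel$log2_fc <- log2(rel$activity / rel$neg_activity)

print(rel[, c("tf", "condition", "log2_fc")])
```

A value of 0 means the reporter is no more active than its negative control; positive values indicate activity above the control.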
Together, these heatmaps indicate that we have informative reporters for ~10 TFs, and that the reporter design matters for some but not all TFs.
SuperPlot of TF activity per condition - this way we can plot not only the mean, but the complete data distribution across technical and biological replicates
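The idea can be sketched with ggplot2 on simulated data: every technical replicate is drawn as a small point, the mean of each biological replicate as a large point, and the mean of those replicate means as a bar. The variable names, replicate numbers and effect sizes below are assumptions for illustration only.

```r
library(ggplot2)

set.seed(42)
# Simulated data: 2 conditions x 3 biological x 4 technical replicates
d <- expand.grid(
  condition = c("ctrl", "stim"),
  bio_rep   = factor(1:3),
  tech_rep  = 1:4
)
d$log2_activity <- rnorm(nrow(d),
                         mean = ifelse(d$condition == "stim", 2, 0),
                         sd = 0.5)

# One mean per biological replicate and condition
rep_means <- aggregate(log2_activity ~ condition + bio_rep, data = d, FUN = mean)

p <- ggplot(d, aes(x = condition, y = log2_activity, colour = bio_rep)) +
  # all individual measurements
  geom_jitter(width = 0.15, alpha = 0.5) +
  # mean of each biological replicate
  geom_point(data = rep_means, size = 4) +
  # mean of the replicate means, drawn as a flat bar per condition
  stat_summary(data = rep_means, aes(group = 1),
               fun = mean, fun.min = mean, fun.max = mean,
               geom = "crossbar", width = 0.4, colour = "black")

print(p)
```

Colouring by biological replicate makes it visible whether a shift between conditions is reproduced in every replicate or driven by a single one.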
SuperPlots comparing different designs
Log-linear expression modelling to explain variance - model for each TF
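A minimal sketch of a per-TF model, reading "log-linear" as a linear model fitted on log2-transformed activity. The design features (`spacing`, `promoter`), the promoter labels and the simulated effect sizes are hypothetical and do not reflect the features used in the actual analysis.

```r
set.seed(1)
# Simulated reporter table (hypothetical features and effect sizes)
reporters <- expand.grid(
  tf       = c("TF_A", "TF_B"),
  spacing  = c(5, 10, 15),
  promoter = c("minP", "mCMV"),
  rep      = 1:3,
  stringsAsFactors = FALSE
)
reporters$log2_activity <-
  ifelse(reporters$tf == "TF_A", 0.2 * reporters$spacing, 0) +
  ifelse(reporters$promoter == "mCMV", 1, 0) +
  rnorm(nrow(reporters), sd = 0.3)

# Fit one linear model on log2 activity per TF
fits <- lapply(split(reporters, reporters$tf), function(d) {
  lm(log2_activity ~ spacing + promoter, data = d)
})

# Fraction of variance explained by the design features, per TF
r_squared <- sapply(fits, function(f) summary(f)$r.squared)
print(round(r_squared, 2))
```

Comparing the R-squared values across TFs shows for which TFs the design features account for most of the variation in reporter activity.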
Can expression variance be explained by the TF properties?
Session Info
paste("Run time: ",format(Sys.time()-StartTime))
## [1] "Run time: 56.5008 secs"
getwd()
## [1] "/DATA/usr/m.trauernicht/projects/SuRE-TF/gen-1_K562"
date()
## [1] "Fri Mar 12 13:45:52 2021"
sessionInfo()
## R version 3.6.3 (2020-02-29)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.7 LTS
##
## Matrix products: default
## BLAS: /usr/lib/libblas/libblas.so.3.6.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] tidyr_1.0.0 stringr_1.4.0 readr_1.3.1 GGally_1.5.0
## [5] gridExtra_2.3 cowplot_1.0.0 plyr_1.8.6 viridis_0.5.1
## [9] viridisLite_0.3.0 ggforce_0.3.1 ggbeeswarm_0.6.0 ggpubr_0.2.5
## [13] magrittr_1.5 pheatmap_1.0.12 tibble_3.0.1 maditr_0.6.3
## [17] dplyr_0.8.5 ggplot2_3.3.0 RColorBrewer_1.1-2
##
## loaded via a namespace (and not attached):
## [1] prettydoc_0.4.0 beeswarm_0.2.3 tidyselect_1.1.0 xfun_0.19
## [5] purrr_0.3.3 lattice_0.20-38 splines_3.6.3 colorspace_1.4-1
## [9] vctrs_0.2.4 htmltools_0.5.0 mgcv_1.8-31 yaml_2.2.1
## [13] rlang_0.4.8 pillar_1.4.3 glue_1.4.2 withr_2.1.2
## [17] tweenr_1.0.1 lifecycle_0.2.0 munsell_0.5.0 ggsignif_0.6.0
## [21] gtable_0.3.0 evaluate_0.14 labeling_0.3 knitr_1.30
## [25] vipor_0.4.5 Rcpp_1.0.5 scales_1.1.0 farver_2.0.1
## [29] hms_0.5.3 digest_0.6.27 stringi_1.5.3 polyclip_1.10-0
## [33] grid_3.6.3 tools_3.6.3 crayon_1.3.4 pkgconfig_2.0.3
## [37] Matrix_1.2-18 ellipsis_0.3.0 MASS_7.3-51.5 data.table_1.12.8
## [41] assertthat_0.2.1 rmarkdown_2.5 reshape_0.8.8 R6_2.5.0
## [45] nlme_3.1-143 compiler_3.6.3